Multichannel high noise level ECG denoising based on adversarial deep learning

This paper proposes a denoising method based on an adversarial deep learning approach for the post-processing of multi-channel fetal electrocardiogram (ECG) signals. As is well known, noise leads to misinterpretation of fetal ECG signals and thus limits the use of fetal electrocardiography in healthcare applications. Denoising algorithms are therefore essential for the exploitation of non-invasive fetal ECG. The proposed method combines three sub-networks, trained end to end, to convert noisy fetal ECG signals into clean signals. The first two sub-networks are linked by skip connections and form a deep convolutional network that downsamples the noisy signals into a latent representation and subsequently upsamples this latent representation to recover clean signals. The third sub-network pushes the decoder sub-network to generate realistic clean signals. Experiments carried out on synthetic and real data showed that the proposed method improved the signal-to-noise ratio (SNR) of fetal ECG signals with input SNR ranging from $-30$ to 0 dB by an average of 20 dB, and improved fetal signal quality by significantly increasing the number of true detected QRS complexes and halving QRS complex detection errors.


Methodology
Proposed approach
Our approach aims to train, in a supervised way, a neural network that denoises highly corrupted fetal ECG signals using a dataset of paired clean and noisy fetal ECG signals. More formally, the problem can be stated as follows: given a paired training dataset $(\tilde{x}, x)_1, \ldots, (\tilde{x}, x)_N$, where $\tilde{x}$ represents the noisy version of the clean fetal ECG signal $x$ for a given sample, our goal is to learn the parameter values of a neural network so that the network maps the noisy signal $\tilde{x}$ to the clean signal $x$. With these optimized parameters, the neural network can be used to denoise any fetal ECG signal, including those not present in the training database.
Figure 1 gives an overview of the proposed approach at training and inference time. Three networks are involved in training. The first two are the encoder and decoder networks, $f_E$ and $f_D$ respectively. They are highlighted in blue and yellow, respectively, in Fig. 1 and are the components of the final network $f$ used to denoise signals at inference time. The third, highlighted in green, is the discriminator network ($f_{disc}$).
The encoder network ($f_E$) is composed of eight dilated convolutional layers. Each layer performs a sequence of three operations: a 1D convolution, a leaky rectified linear unit (leakyReLU) activation function, and an instance normalization. The encoder downscales the noisy input fetal ECG signal $\tilde{x}$, compressing it to a lower-dimensional vector $\hat{z}$ that is hypothesized to live in the space of the best features of $\tilde{x}$ and is therefore referred to as the latent vector of $\tilde{x}$. In other words, by mapping $\tilde{x}$ to $\hat{z}$, $f_E$ retains only the useful features of the signal and discards useless ones such as noise.
The decoder network upscales the latent representation $\hat{z}$ to a signal $\hat{x}$ with the same dimension as $x$, so that the two can be compared consistently. It consists of nine layers. Each of the first eight performs the same sequence of three operations as an encoder layer, except that the dilated convolution is replaced by a transposed convolution. The last layer of the decoder is a regular convolutional layer with no activation function or normalization.
The discriminator network ($f_{disc}$) is composed of four layers, each performing a linear operation followed by a rectified linear unit (ReLU) activation, except the last one. Its role is to distinguish the signal $\hat{x}$ generated by the decoder from the clean signal $x$. The $f_{disc}$ network is the part responsible for adversarial learning in the proposed approach.
Figure 2 depicts the architectures of the encoder, decoder and discriminator networks. In this figure, pairs of convolutional and mirrored transposed convolutional layers are linked by dashed arrows. These dashed arrows are "skip" connections, which help to mitigate the vanishing gradient problem that occurs in deep architectures 59 such as the one proposed. The input dimensionality of the features at each layer is also shown in Fig. 2, and a more detailed view, with the output dimensionality, the number of trainable parameters and all the variables used by the operations at each layer, is given in Table 1.
During training, the encoder learns to retain the useful features of a noisy input signal $\tilde{x}$ by mapping it to a much lower-dimensional vector $\hat{z} = f_E(\tilde{x})$. In addition, the encoder is used a second time to map the clean signal $x$ to the corresponding latent vector $z = f_E(x)$. Since $z$ and $\hat{z}$ have the same dimension, they can be compared directly, which strengthens the encoder's ability to eliminate unnecessary features when encoding noisy signals. This second use of the encoder is one of the distinctive aspects of the proposed approach. The decoder takes the latent vector $\hat{z}$ as input and learns to generate a signal $\hat{x} = f_D(\hat{z})$ as close as possible to the clean signal $x$. The generated signal and the clean signal are then fed into the discriminator which, by discriminating between them, pushes the decoder to generate better signals that eventually fool the discriminator. The loss functions used to train the respective networks are presented in the next section.
The convolution operations in the encoder and decoder networks use a dilation factor to enlarge the receptive field of the networks, so that the temporal structure of the noisy input ECG signal can be captured at multiple scales. In addition, the encoder network is designed to handle a four-channel ECG signal, which allows the spatial information provided by each channel to be exploited. In this way, spatio-temporal information is used to eliminate noise and provide clean signals while preserving the morphology of the individual ECG components.
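To make the effect of dilation concrete, the following minimal sketch computes the receptive field of a stack of 1D convolutions. The kernel size of 3 and the doubling dilation schedule are hypothetical values chosen for illustration; the actual layer settings are those listed in Table 1.

```python
def receptive_field(kernel_sizes, dilations, strides=None):
    """Receptive field, in input samples, of a stack of 1D convolutions."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf = 1    # one output sample initially covers one input sample
    jump = 1  # distance, in input samples, between neighbouring outputs
    for k, d, s in zip(kernel_sizes, dilations, strides):
        rf += (k - 1) * d * jump
        jump *= s
    return rf


# Hypothetical 8-layer stacks with kernel size 3 (not the values of Table 1).
plain = receptive_field([3] * 8, [1] * 8)
dilated = receptive_field([3] * 8, [1, 2, 4, 8, 16, 32, 64, 128])
print(plain, dilated)
```

With the same number of layers and parameters, the dilated stack in this example covers a far longer stretch of the signal than the undilated one, which is the property exploited to capture temporal structure at multiple scales.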
Based on the above, after training, the encoder and decoder networks form the final filter $f = f_D \circ f_E$, which is used to denoise unseen noisy fetal ECG signals.

Objective function and training parameters
To train the network parameters of the proposed approach, we formulate an objective function that combines three loss functions, each targeting an individual sub-network of the proposed approach.
The first is a contractive denoising auto-encoder loss $L_{ctr}$, introduced in Refs. 60,61 and used in Ref. 62 for ECG filtering purposes. It is the sum of two terms: a classical denoising auto-encoder loss $L_{rec}$, computing the mean squared error (MSE) between the clean fetal ECG and the denoised fetal ECG, and the Frobenius norm of the Jacobian matrix computed from the noisy input fetal ECG. Hence, the contractive loss $L_{ctr}$ is defined as follows:
$$L_{ctr} = w_{rec}\, L_{rec} + w_{\omega}\, \lVert J(\tilde{x}) \rVert_F^2 \tag{1}$$
with
$$L_{rec} = \mathrm{MSE}(x, \hat{x}), \qquad \hat{x} = f(\tilde{x}) \tag{2}$$
and,
$$\lVert J(\tilde{x}) \rVert_F^2 = \sum_{i,j} \left( \frac{\partial \hat{z}_j}{\partial \tilde{x}_i} \right)^2 \tag{3}$$
The next loss is responsible for adversarial learning, as introduced in generative adversarial networks (GANs) 63. The GAN framework commonly comprises two sub-networks with competing objectives, a generator network and a discriminator network 63. GANs have achieved considerable success in image processing and are still being explored for time series applications such as classification and synthesis [64][65][66][67][68]. However, it has been shown that many GAN-based algorithms trained with the classical adversarial loss may not converge 63,69. To avoid this problem and continue to benefit from the advantages offered by GANs, several techniques for computing loss functions have been introduced to encourage the convergence of GAN-based algorithms [70][71][72]. We use the one called feature matching, proposed in Ref. 70, which uses the internal representation of the discriminator network to update the generator parameters. In our reformulation of the GAN framework, the decoder network plays the role of the generator network; hence the adversarial loss $L_{adv}$ is defined as the MSE between the feature representations of the clean fetal ECG and the generated fetal ECG:
$$L_{adv} = \mathrm{MSE}\big(f_{disc}(x), f_{disc}(\hat{x})\big) \tag{4}$$
where $f_{disc}$ is a function that outputs an intermediate layer of the discriminator network.
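As an illustration of the Jacobian term in the contractive loss, the sketch below estimates the squared Frobenius norm of a Jacobian by central finite differences. The `toy_encoder` function is a hypothetical linear map standing in for the encoder; in practice the term would be computed by automatic differentiation on the trained network, and which sub-network the Jacobian is taken over is an assumption here.

```python
def jacobian_frobenius_sq(f, x, eps=1e-6):
    """Squared Frobenius norm of the Jacobian of f at x (central differences)."""
    total = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        # Column i of the Jacobian: derivative of every output w.r.t. x[i].
        for a, b in zip(f(x_plus), f(x_minus)):
            total += ((a - b) / (2 * eps)) ** 2
    return total


def toy_encoder(x):
    # Hypothetical linear "encoder" z = W x with W = [[1, 2, 0], [0, -1, 3]].
    return [x[0] + 2 * x[1], -x[1] + 3 * x[2]]


penalty = jacobian_frobenius_sq(toy_encoder, [0.5, -1.0, 2.0])
print(round(penalty, 6))
```

For a linear map the Jacobian is the weight matrix itself, so the penalty equals the sum of its squared entries. A small penalty means that the latent vector changes little when the input is perturbed, which is the robustness to noise that the contractive term encourages.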
The losses $L_{ctr}$ and $L_{adv}$ introduced above encourage the neural network $f$, and especially its decoder part $f_D$, to produce realistic signals similar to the clean fetal ECG. This is only possible if the encoder part produces a vector $\hat{z}$ that captures the best representations of $x$. To enforce this, we introduce an additional loss function $L_{enc}$, defined as the MSE between the latent vector of the clean fetal signal and the latent vector of the noisy fetal signal:
$$L_{enc} = \mathrm{MSE}(z, \hat{z}) \tag{5}$$
Overall, the objective function used to train the proposed approach is defined as the weighted sum of the above loss functions:
$$L = w_{enc}\, L_{enc} + w_{adv}\, L_{adv} + w_{rec}\, L_{rec} + w_{\omega}\, \lVert J(\tilde{x}) \rVert_F^2 \tag{6}$$
where $w_{enc}$, $w_{adv}$, $w_{rec}$ and $w_{\omega}$ are the hyper-parameters adjusting the contribution of the individual losses to the overall objective function. Since each fetal signal has four channels, the overall objective is computed along each channel and then averaged.
In our experiments, we organized the signals in batches of size B = 8 and optimized with the variant of the Adam algorithm proposed in Ref. 73, with the weight decay and learning rate parameters set to $5 \times 10^{-2}$ and $10^{-5}$, respectively. We found empirically that setting the weighting parameters $w_{enc}$, $w_{adv}$, $w_{rec}$, $w_{\omega}$ to 4, $10^{-2}$, 25, $10^{-4}$, respectively, allowed fast convergence. The network was implemented in PyTorch and trained for 20 epochs on a 64-bit Intel Core i7 processor with 8 GB of RAM.
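The way the individual losses combine can be sketched for a single channel as follows. The weights are those reported above; treating the combination as a plain weighted sum, and the toy vectors used for the signals, latent vectors and discriminator features, are assumptions made for illustration only.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Weights reported in the text for w_enc, w_adv, w_rec and w_omega.
WEIGHTS = {"enc": 4.0, "adv": 1e-2, "rec": 25.0, "omega": 1e-4}


def total_objective(x_clean, x_denoised, z_clean, z_noisy,
                    feat_clean, feat_denoised, jac_penalty, w=WEIGHTS):
    """Weighted sum of the encoder, adversarial and contractive terms."""
    l_rec = mse(x_clean, x_denoised)        # reconstruction term
    l_enc = mse(z_clean, z_noisy)           # latent-space term
    l_adv = mse(feat_clean, feat_denoised)  # feature-matching term
    return (w["enc"] * l_enc + w["adv"] * l_adv
            + w["rec"] * l_rec + w["omega"] * jac_penalty)


# Toy single-channel values (hypothetical, for illustration only).
loss = total_objective(
    x_clean=[0.0, 1.0, 0.0, -1.0], x_denoised=[0.0, 0.5, 0.0, -1.0],
    z_clean=[1.0, 2.0], z_noisy=[1.0, 1.0],
    feat_clean=[0.0, 0.0], feat_denoised=[2.0, 0.0],
    jac_penalty=15.0,
)
print(loss)
```

For a four-channel signal, this quantity would be evaluated per channel and the four values averaged, as stated above.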

Data for training and evaluation
Two datasets were used to train and evaluate the proposed method, as well as the other methods chosen for benchmarking. The first is a synthetic fetal ECG dataset created with the open-source fecgsyn toolbox developed in Ref. 74. The fecgsyn toolbox is used to create a dataset suitable for network training by dividing each sample in the dataset into a noisy abdominal signal and a clean fetal ECG signal of thirty-two channels each. Using the fecgsyn toolbox, we simulated different physiological event cases with noise levels ranging from −12 to 12 dB for training and from −30 to 0 dB for evaluation. The physiological event cases considered are described in Table 2. They are similar to the events simulated in the Fetal ECG Synthetic Database 74,75, except that we excluded the case of twin pregnancy, as the proposed model was not developed to handle it. For statistical purposes, each simulation was run five times independently using a five-minute signal at a sampling rate of 250 Hz.
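The notion of simulating a given noise level can be illustrated by scaling a noise record so that the mixture reaches a target SNR. This is a generic sketch and not the fecgsyn implementation; the sinusoidal stand-in for the fetal ECG and the deterministic noise pattern are hypothetical.

```python
import math


def power(sig):
    """Mean power of a signal."""
    return sum(v * v for v in sig) / len(sig)


def mix_at_snr(clean, noise, target_db):
    """Return clean + g * noise, with g chosen so that the SNR equals target_db."""
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (target_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def snr_db(clean, noisy):
    """SNR in dB of a noisy signal with respect to its clean reference."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))


# Hypothetical stand-ins for a clean fetal ECG channel and a noise record.
clean = [math.sin(2 * math.pi * t / 50) for t in range(500)]
noise = [((t * 37) % 11 - 5) / 5 for t in range(500)]
noisy = mix_at_snr(clean, noise, target_db=-30)
print(round(snr_db(clean, noisy), 3))
```

At −30 dB the noise power is a thousand times the signal power, which gives a sense of how severely corrupted the evaluation signals are.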
The second is an open-access dataset called the Abdominal and Direct Fetal Electrocardiogram Database 76. It consists of real multi-lead abdominal fetal ECG signals obtained from five women in labor, between 38 and 41 weeks of gestation. Each signal was recorded with 16-bit resolution at a sampling rate of 1000 Hz for five minutes and was processed by digital filters to eliminate baseline and power-line interference 76. For each record, four-channel abdominal mixtures and the corresponding direct fetal ECG signals are given.

Data preprocessing
Before applying the proposed algorithm to the data, certain preprocessing steps are necessary. First, the maternal ECG was cancelled in the abdominal part of all recordings in the synthetic and real databases, using the method based on extended Kalman filtering 77.
Then, for each thirty-two-channel signal in the synthetic dataset, eight channels were selected to form two four-channel signals. The channels were selected according to the recommendations given in Ref. 74. All channels of the real database were used, since there are only four. Finally, all the fetal ECG signals were resampled to 500 Hz to have a common sampling frequency, and they were divided into sequences of 4 × 1920 samples. The signals were also standardized along each channel to have zero mean and unit standard deviation.
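The segmentation and standardization steps can be sketched as follows, assuming non-overlapping windows, a discarded trailing partial window, and standardization applied per window; none of these details is specified above, and the resampling step is omitted. The sinusoidal recording is a hypothetical stand-in for a resampled four-channel signal.

```python
import math


def segment(channels, window=1920):
    """Split a multi-channel recording into non-overlapping windows.

    `channels` is a list of equal-length channels; a trailing partial
    window is dropped (an assumption of this sketch).
    """
    n_windows = len(channels[0]) // window
    return [[ch[i * window:(i + 1) * window] for ch in channels]
            for i in range(n_windows)]


def standardize(channel):
    """Rescale one channel to zero mean and unit standard deviation."""
    mean = sum(channel) / len(channel)
    std = math.sqrt(sum((v - mean) ** 2 for v in channel) / len(channel))
    if std == 0:
        return [0.0] * len(channel)
    return [(v - mean) / std for v in channel]


# Five minutes at 500 Hz gives 150,000 samples per channel.
recording = [[math.sin(0.01 * t + c) for t in range(150_000)] for c in range(4)]
windows = [[standardize(ch) for ch in win] for win in segment(recording)]
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each element of `windows` is then a 4 × 1920 block, matching the input shape described above.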

Performance metrics
In this study, two sets of performance measures were used to evaluate the proposed approach. The first set measures the divergence between the signal output by the proposed method and the ground truth signal. These measures were applied only to the synthetic dataset, since a clean version of each noisy fetal ECG is available there. They include the signal-to-noise ratio improvement ($SNR_{imp}$), the root mean-square error (RMSE) and the percent root distortion (PRD). $SNR_{imp}$ measures the difference in SNR between a denoised signal and the corresponding noisy input signal. A higher value of $SNR_{imp}$ indicates better denoising performance. For a channel $c$ of a signal with $T$ samples, it can be expressed as follows:
$$SNR_{imp}^{c} = SNR_{out}^{c} - SNR_{in}^{c} \tag{7}$$
where $SNR_{in}$ and $SNR_{out}$ are defined as:
$$SNR_{in}^{c} = 10 \log_{10} \frac{\sum_{t=1}^{T} x_c[t]^2}{\sum_{t=1}^{T} \big(\tilde{x}_c[t] - x_c[t]\big)^2} \tag{8}$$
$$SNR_{out}^{c} = 10 \log_{10} \frac{\sum_{t=1}^{T} x_c[t]^2}{\sum_{t=1}^{T} \big(\hat{x}_c[t] - x_c[t]\big)^2} \tag{9}$$
RMSE measures the deviation between the denoised signal and the corresponding clean signal. A lower value of RMSE corresponds to a smaller difference and therefore indicates better performance. For a channel $c$ of a signal, RMSE is defined as:
$$RMSE^{c} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \big(x_c[t] - \hat{x}_c[t]\big)^2} \tag{10}$$
PRD can be used as an indicator of the recovery quality of the compressed signal. A lower value of PRD indicates a better quality of the denoised signal. For a channel $c$ of a signal, it is defined as:
$$PRD^{c} = 100 \times \sqrt{\frac{\sum_{t=1}^{T} \big(x_c[t] - \hat{x}_c[t]\big)^2}{\sum_{t=1}^{T} x_c[t]^2}} \tag{11}$$
These metrics are computed for each channel and then averaged.
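A minimal single-channel sketch of these three measures is given below, assuming their usual definitions (SNR as the ratio of clean-signal energy to error energy, in dB). The toy signals are hypothetical and chosen so that the values can be checked by hand.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB of `estimate` with respect to the clean reference."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((e - c) ** 2 for c, e in zip(clean, estimate))
    return 10 * math.log10(signal_energy / error_energy)


def snr_improvement(clean, noisy, denoised):
    """Output SNR minus input SNR, in dB."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)


def rmse(clean, denoised):
    """Root mean-square error between clean and denoised signals."""
    return math.sqrt(sum((c - d) ** 2 for c, d in zip(clean, denoised)) / len(clean))


def prd(clean, denoised):
    """Percent root distortion, relative to the clean signal energy."""
    error_energy = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    return 100 * math.sqrt(error_energy / sum(c * c for c in clean))


# Toy channel: input SNR of 0 dB, output SNR of 20 dB.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [2.0, -2.0, 2.0, -2.0]
denoised = [1.1, -1.1, 1.1, -1.1]
print(snr_improvement(clean, noisy, denoised), rmse(clean, denoised), prd(clean, denoised))
```

For a four-channel signal, each function would be applied to every channel and the results averaged.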
For the dataset containing real fetal ECG, the above metrics cannot be used, since there are no clean fetal ECG signals to serve as ground truth. However, the dataset provides the abdominal recordings together with a simultaneously recorded scalp ECG annotated by experts 76. Although recorded directly from the fetal head, scalp measurements contain a considerable amount of noise. Moreover, they are recorded on a different lead than the abdominal leads; thus, even in the case of perfect denoising, the denoised signals and the scalp signals cannot be matched. Nevertheless, the manifestations of major cardiac electrical events such as ventricular depolarization should coincide and be aligned in the ECG signals from the abdomen and the scalp. It is therefore possible to base the evaluation of the proposed method on the correctness of QRS complex detection. A QRS complex is counted as correctly detected if it falls within ± 50 milliseconds of the scalp R-peak location 78. Based on correct and incorrect detections, the following metrics can be defined:
$$PPV = \frac{TP}{TP + FP} \tag{12}$$
$$SE = \frac{TP}{TP + FN} \tag{13}$$
$$F_1 = \frac{2 \cdot PPV \cdot SE}{PPV + SE} \tag{14}$$
where TP is True Positive, i.e. the correctly detected QRS complexes; FN is False Negative, i.e. the undetected QRS complexes; and FP is False Positive, i.e. the incorrectly detected QRS complexes. The positive predictive value (PPV) measures the precision of the detection algorithm as the ratio of true-positive QRS detections to the total number of QRS complexes detected; the sensitivity (SE) measures the ability of the detection algorithm to detect true-positive QRS complexes relative to the total number of QRS complexes annotated; and $F_1$ is the harmonic mean of SE and PPV, providing a balanced measure of the performance of the detection algorithm. These metrics are widely used to evaluate QRS complex detection, and higher values indicate better denoising performance 48,51.
The first comparative method is wavelet-based. The wavelet transform is widely studied and used for ECG signal analysis because, by expanding the signal in terms of a wavelet function localized in both time and frequency, it provides good time resolution at high frequency and good frequency resolution at low frequency 79. Wavelet-based denoising methods consist of three main steps: decomposing the signal into coefficients, comparing these coefficients with a certain threshold, and reconstructing the signal from the thresholded coefficients. We selected the sixth-order symlet wavelet as the mother wavelet because it works well on ECG noise 80, and the SureShrink method was used for thresholding 81,82.
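The three steps of wavelet denoising (decompose, threshold, reconstruct) can be illustrated with a deliberately simplified sketch. It swaps in a one-level Haar transform and a fixed, user-supplied soft threshold in place of the sixth-order symlet and the SureShrink rule used in this study, so it shows the mechanism only and does not reproduce the benchmark.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level Haar transform of an even-length signal: (approx, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(coeffs, thr):
    """Shrink each coefficient towards zero by thr, zeroing the small ones."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(x, thr):
    """Decompose, threshold the detail coefficients, and reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, thr))


signal = [1.0, 1.2, 3.0, 2.8]
print(denoise(signal, thr=0.05))
```

A threshold of zero returns the input unchanged, while a very large threshold removes all detail coefficients and leaves only pairwise averages. This also illustrates why aggressive thresholding tends to flatten beat-to-beat variations.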
The next comparative method is a supervised learning-based method. It is a denoising convolutional auto-encoder (FCN-DAE) proposed in Ref. 83 for noise reduction in ECG signals. The architecture of FCN-DAE has three main parts. The encoder and decoder parts use convolution and deconvolution operations, respectively, together with batch normalization and the exponential linear unit (ELU) as activation function 84. The last part is the output layer, which performs only a deconvolution operation. A more exhaustive description can be found in Ref. 83. It is important to note that FCN-DAE was originally proposed for single-channel ECG filtering 83, so we slightly modified it to handle four-channel signals.
The last method is the supervised learning-based method designed in Ref. 57 specifically for denoising multichannel fetal ECG. It is a deep convolutional neural network (DCNN) consisting of an encoder of eight convolutional layers symmetrically connected to the eight transposed convolutional layers of a decoder. The encoder layers perform convolutions such that the signal is downsampled by two after each layer, while the decoder layers perform transposed convolutions that upsample the signal by two after each layer. Leaky rectified linear units with a slope of 0.2 are used as the non-linearity at each layer. A more exhaustive description of this method can be found in Ref. 57.
For a fair comparative study, it is important to note that the comparative denoising methods and the proposed method do not have the same complexity. The calibration of wavelet-based methods relies solely on the choice of wavelet family and order and on the choice of threshold; it is done through a trial-and-error process, usually in a short time. Methods based on deep learning require more parameters, more computational resources and more time to train these parameters. The proposed model uses 49,229,492 parameters to estimate the denoised signal, while the DCNN model uses 93,649,796 parameters, just under twice as many, for the same task. It should be noted, however, that training the proposed model is more complex due to the dual use of the encoder and the presence of the discriminator (Fig. 2). The FCN-DAE model is the simplest of the deep learning models used in this study. It has 86,211 parameters and, unlike the other two, uses no residual connections.

Evaluation on synthetic dataset
The proposed method and the existing methods were evaluated on the synthetic dataset, and the results are presented in Fig. 3. Figure 3 shows the improvement in SNR, the RMSE and the PRD for input SNR ranging from −30 to 0 dB. It can be observed that the proposed method outperforms the existing methods, providing the highest values of $SNR_{imp}$ and the lowest values of RMSE and PRD throughout the whole range of input SNR. More explicitly, for input SNR between −10 and 0 dB, the proposed method and the DCNN method 57 have comparable performance, while the performance of the FCN-DAE method 83 and that of the wavelet denoising method gradually become similar. As the input SNR decreases, the DCNN and wavelet denoising methods perform less well, rendering them useless in cases of very low-quality signals, while the FCN-DAE method still performs well, although not as well as the proposed method. Figure 3 shows that the wavelet denoising method gives the worst results. This is not surprising because, in the presence of heavily noisy fetal ECG signals, the wavelet denoising method is not able to preserve individual variations between ECG complexes and tends to distort the signal amplitude, whereas the proposed method and the existing learning-based methods, DCNN and FCN-DAE, achieve better results. Figure 4 illustrates this phenomenon for a typical signal from our synthetic test dataset. The values of the performance metrics before and after denoising are provided in Table 3. It can be seen that the signal output by the proposed method is cleaner and closer to the ground truth signal than the signals output by the other denoising methods. For this particular example, the output SNR of the proposed method is around 5 dB higher than that of the second best method, FCN-DAE.
These results clearly show that the proposed method significantly improves the quality of noisy fetal ECG signals. It preserves the morphology, amplitude, and variations among individual fetal ECG complexes, even when most signal channels are severely corrupted. This result is not surprising and is explained by the loss function (Eq. 5), which explicitly forces the network to capture the best representations of the fetal ECG signal by examining both the noisy fetal ECG and the clean fetal ECG. Moreover, the multi-lead configuration of the input signal enables the network to capture spatio-temporal information in the fetal ECG. This information is beneficial when signal channels are severely corrupted, since the network has sufficient features to recover each channel.

Evaluation of real dataset
As previously stated, the denoising performance of the proposed and existing methods cannot be measured directly in the case of real fetal ECG signals, as there are no clean reference signals. Instead, we examine how well each method facilitates QRS complex detection. To quantify this, we applied the denoising methods and then the Hamilton peak detector to the signal extracted from the r10 recording of the real fetal ECG dataset. Only the r10 recording was selected from the five available because, after applying the extraction method, the others were too noisy to allow denoising and peak detection.
Figure 5 illustrates a comparison of QRS complex detection on a piece of extracted noisy fetal ECG and on the denoised signals obtained with the proposed and existing methods. A corresponding piece of scalp ECG is also provided. The red dots in the plots indicate the positions of the QRS complexes detected by the Hamilton peak detector, with the exception of those in the scalp ECG signal, which are provided by the database. For reasons of simplicity and visualization, only the first channels of the denoised signals are shown. Using the QRS complexes detected on the denoised signals by the Hamilton peak detector and those of the scalp ECG provided by the database, we use the performance metrics described by Eqs. (12), (13) and (14) to quantitatively evaluate the methods. These metrics are calculated for each channel and for all samples in record r10 of the real dataset. Figure 6 shows the average values of these metrics, while Table 4 gives the values of the performance metrics for each channel in more detail.
These results indicate that, on the signals output by the proposed method and by the existing methods, with the exception of the DCNN method, the number of true detected QRS complexes (TP) increases and the QRS complex detection errors (FP and FN) decrease. The proposed method performs significantly better than the others, achieving the best scores on all QRS complex detection measures, with a clear margin (around 29%, 18% and 26% difference in PPV, SE and $F_1$, respectively) over the second best method, FCN-DAE. The proposed denoising method considerably increases the number of true detected QRS complexes (TP) (by over 400 complexes on average) and reduces the detection errors (FP and FN) by around 50% on average.
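The counting of TP, FP and FN with the ± 50 ms tolerance, and the resulting PPV, SE and $F_1$ scores, can be sketched as follows. The greedy one-to-one matching of sorted peak times is an assumption of this sketch, and the peak times are hypothetical values in milliseconds.

```python
def match_peaks(detected_ms, reference_ms, tol_ms=50):
    """Greedy one-to-one matching of detected to reference peaks.

    Returns (TP, FP, FN); a detection counts as correct if it lies
    within tol_ms of a reference (scalp) R-peak not already matched.
    """
    detected = sorted(detected_ms)
    reference = sorted(reference_ms)
    tp = 0
    i = j = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol_ms:
            tp += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection precedes every remaining reference: FP
        else:
            j += 1  # reference peak has no detection nearby: FN
    return tp, len(detected) - tp, len(reference) - tp


def detection_scores(tp, fp, fn):
    """Positive predictive value, sensitivity and F1 score."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    se = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * se / (ppv + se) if ppv + se else 0.0
    return ppv, se, f1


reference = [400, 820, 1240, 1660]       # hypothetical scalp annotations
detected = [410, 700, 1235, 1700, 1900]  # hypothetical detector output
tp, fp, fn = match_peaks(detected, reference)
ppv, se, f1 = detection_scores(tp, fp, fn)
print(tp, fp, fn, ppv, se, round(f1, 4))
```

In this toy case three of the four annotated beats are found and two detections are spurious, so reducing FP and FN both raise $F_1$.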

Figure 1. Training and inference schemes of the proposed approach for fetal ECG denoising.

Figure 2. The Encoder, Decoder, and Discriminator architectures of the proposed approach.

Figure 3. Performance of the proposed network in comparison with the existing methods in terms of (a) the improvement in signal-to-noise ratio ($SNR_{imp}$), (b) the root mean-square error (RMSE), and (c) the percent root distortion (PRD). Each metric was computed along each channel and then averaged.

Figure 4. Qualitative comparison of the denoising of a synthetic signal from the test dataset by the proposed and existing denoising methods. (a) The noisy input fetal ECG, (b) the ground truth fetal ECG. The signal produced by (c) the proposed method, (d) the DCNN method, (e) the FCN-DAE method, (f) the wavelet method. The SNR values before and after applying the denoising methods are given in Table 3.

Figure 5. Qualitative comparison of peak detection on the outputs of the proposed and existing denoising methods. (a) The fetal scalp ECG, (b) the noisy input fetal ECG. The output signal of (c) the proposed method, (d) the DCNN method, (e) the FCN-DAE method, (f) the wavelet method. Only the first channels of the output signals are displayed for better visualization.

Figure 6. Performance of the proposed method and the existing methods in the case of real fetal ECG signals. The metrics considered are (a) the positive predictive value (PPV), (b) the sensitivity (SE), and (c) the $F_1$ score. Each metric was computed along each channel and then averaged.

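Table 1. Detailed description of the proposed network architectures. s and d stand for the stride and dilation parameters, respectively; IN stands for Instance Normalization.
Discriminator: Linear(in = 7680, out = 1920)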

Table 2. Description of the physiological events of the signals in the synthetic dataset.

Table 3. Quantitative comparison in terms of SNR of the signal depicted in Fig. 4, before ($SNR_{in}$) and after ($SNR_{out}$) denoising.

Table 4. QRS complex detection performance based on the Hamilton peak detector.